Rule base combined linguistics knowledge with corpus
Authors
Abstract
This paper proposes a new approach to the construction of rule bases for transfer-based machine translation. In our approach, the rule bases are constructed by combining linguistic knowledge with large-scale corpora. On the one hand, lexical, syntactic and semantic knowledge are all used in the rules. On the other hand, this knowledge is used for the statistics and self-learning of rules. In each rule base, all rules are scored and ranked, so that an objective choice can be made for each sentence. The preliminary experimental results show that the approach may increase the speed of building the rule base and improve the quality of the rules.
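The abstract does not say how rules are scored, so the following is only a minimal sketch of the idea of scoring and ranking rules within a rule base. The `Rule` class, its fields, the corpus counts, and the smoothed success-rate score are all hypothetical choices made for illustration, not the paper's method.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A hypothetical transfer rule (names and fields are assumptions)."""
    name: str
    source_pattern: str
    target_pattern: str
    matches: int = 0   # assumed statistic: times the source pattern matched in a corpus
    correct: int = 0   # assumed statistic: times the rule's output agreed with a reference

    def score(self) -> float:
        # Assumed scoring function: add-one smoothed success rate.
        return (self.correct + 1) / (self.matches + 2)


def rank_rules(rules):
    """Return the rules sorted from highest to lowest score."""
    return sorted(rules, key=lambda r: r.score(), reverse=True)


def choose_rule(rules, applicable):
    """Pick the best-scoring rule among those whose names are in `applicable`."""
    candidates = [r for r in rank_rules(rules) if r.name in applicable]
    return candidates[0] if candidates else None


# Made-up counts, for illustration only.
rule_base = [
    Rule("R1", "NP VP", "NP VP", matches=100, correct=80),
    Rule("R2", "NP VP", "VP NP", matches=10, correct=9),
    Rule("R3", "ADJ N", "N ADJ", matches=50, correct=20),
]
ranked = [r.name for r in rank_rules(rule_base)]
best = choose_rule(rule_base, {"R1", "R3"})
```

Because the choice depends only on the stored scores, two rules that both apply to a sentence are resolved the same way every time, which is one reading of the "objective choice" the abstract mentions.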
Similar resources
An Approach to Example-Based Machine Translator using Translation Memory
This paper presents an example-based machine translation architecture using translation memory that integrates the use of examples for flexible, idiomatic translations with the use of linguistic rules for broad coverage and grammatical accuracy. The example-based machine translation (EBMT) approach to machine translation is often characterized by its use of a bilingual corpus with parallel texts as ...
A Corpus-Based Approach to Language Learning
Eric Brill. Supervisor: Mitchell Marcus. One goal of computational linguistics is to discover a method for assigning a rich structural annotation to sentences that are presented as simple linear strings of words; meaning can be much more readily extracted from a structurally annotated sentence than from a sentence with no structural information. Also str...
The Multi-layer Language Knowledge Base of Chinese NLP
This paper introduces an effort to build a multi-layer knowledge base for Chinese NLP that combines list-based, rule-based and corpus-based language information. Different kinds of information are designed to solve the different kinds of problems encountered in Chinese NLP. The whole knowledge base is designed with theoretical consistency and can easily be put into practice in the app...
Exploiting Wikipedia as a Knowledge Base for the Extraction of Linguistic Resources: Application on Arabic-French Comparable Corpora and Bilingual Lexicons
We present simple and effective methods for extracting comparable corpora and bilingual lexicons from Wikipedia. We shall exploit the large scale and the structure of Wikipedia articles to extract two resources that will be very useful for natural language applications. We build a comparable corpus from Wikipedia using categories as topic restrictions and we extract bilingual lexicons from inte...
Using Corpus Statistics and WordNet Relations for Sense Identification
Corpus-based approaches to word sense identification have flexibility and generality but suffer from a knowledge acquisition bottleneck. We show how knowledge-based techniques can be used to open the bottleneck by automatically locating training corpora. We describe a statistical classifier that combines topical context with local cues to identify a word sense. The classifier is used to disambig...
Journal title:
Volume, issue
Pages -
Publication date: 2003